Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor!: change model to output mel-band oriented tensors instead of time-oriented ones #94

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

roedoejet
Copy link
Member

@roedoejet roedoejet commented Oct 30, 2024

Copy link

semanticdiff-com bot commented Oct 30, 2024

Review changes with  SemanticDiff

Changed Files
File Status
  fs2/prediction_writing_callback.py  17% smaller
  fs2/cli/synthesize.py  16% smaller
  fs2/model.py  0% smaller

Copy link
Member

@joanise joanise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, modulo some questions in the comments below.

assert "output" in outputs and outputs["output"] is not None
assert wavs.shape[0] == outputs["output"].size(
wavs.ndim == 3
), f"The generated audio did not contain 3 dimensions. First dimension should be B(atch) and the second dimension should be C(hannels) and third dimension should be T(ime) in samples. Got {wavs.shape} instead."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this happen due to a user error (like providing the wrong kind of input file), or is this strictly due to a programmer error? If the latter, OK, if the former, I don't like using assert.

basename=basename,
speaker=speaker,
language=language,
torchaudio.save(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the change of audio writer function related to this PR, or just an unrelated improvement? I assume you've tested and you can confirm this works well?

data[:unmasked_len]
.cpu()
.transpose(0, 1), # save tensors as [K (bands), T (frames)]
str(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't use to need to cast this Path to a str, I wonder why you do now. In my PR #102, get_filename is factored out to the base class, we should have it do return str(path) in one place, in get_filename, instead of casting everywhere we use it.
Warning: Whichever PR is merged second will have to rebase and resolve conflicts over the use of get_filename.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebased and fixed using my suggestion here

Copy link

codecov bot commented Dec 10, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 3 lines in your changes missing coverage. Please review.

Project coverage is 46.13%. Comparing base (2afc610) to head (9549082).

Files with missing lines Patch % Lines
fs2/model.py 0.00% 2 Missing ⚠️
fs2/cli/synthesize.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #94      +/-   ##
==========================================
- Coverage   46.24%   46.13%   -0.12%     
==========================================
  Files          22       22              
  Lines        1464     1461       -3     
==========================================
- Hits          677      674       -3     
  Misses        787      787              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants